Estimating the 6D pose of objects is one of the major fields in 3D computer vision. Following the promising results of instance-level pose estimation, research is moving toward category-level pose estimation for more practical application scenarios. However, unlike the well-established instance-level pose datasets, the available category-level datasets fall short in annotation quality and in the number of provided poses. We propose HouseCat6D, a new category-level 6D pose dataset featuring 1) multi-modality of polarimetric RGB (RGB+P) and depth, 2) 194 highly diverse objects from 10 household object categories, including 2 photometrically challenging categories, 3) high-quality pose annotation with an error range of only 1.35 mm to 1.74 mm, 4) 41 large-scale scenes with extensive viewpoint coverage, and 5) a checkerboard-free environment throughout the entire scene. We also provide benchmark results of state-of-the-art category-level pose estimation networks.
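6-DoF robotic grasping is a long-standing but unsolved problem. Recent methods use strong 3D networks to extract geometric grasp representations from depth sensors, showing excellent accuracy on common objects but performing unsatisfactorily on photometrically challenging objects (e.g., objects made of transparent or reflective materials). The bottleneck is that the surfaces of these objects cannot reflect accurate depth because of light absorption or refraction. In this paper, rather than exploiting inaccurate depth data, we propose the first RGB-only 6-DoF grasping pipeline, called MonoGraspNet, which uses stable 2D features to handle arbitrary object grasping while overcoming the problems caused by photometrically challenging objects. MonoGraspNet uses keypoint heatmaps and normal maps to recover 6-DoF grasp poses expressed in our novel representation, which consists of 2D keypoints with corresponding depth, grasp direction, grasp width, and angle. Extensive experiments in real scenes show that our method achieves competitive results on common objects and outperforms the depth-based competitor by a large margin on photometrically challenging objects. To further stimulate robotic manipulation research, we also annotate and open-source a multi-view, multi-scene real-world grasping dataset containing 120 objects of mixed photometric complexity with 20M accurate grasp labels.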
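MonoGraspNet's grasp representation pairs each 2D keypoint with a depth value, so a 3D grasp point can be recovered by back-projecting the keypoint through the camera model. The sketch below assumes a standard pinhole camera with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`) and a hypothetical closing-axis convention for placing the two gripper fingers; it illustrates the geometry only and is not the paper's implementation.

```python
import math
from dataclasses import dataclass

# Illustrative sketch: the intrinsics and the closing-axis convention are assumptions,
# not taken from MonoGraspNet.


@dataclass(frozen=True)
class PinholeIntrinsics:
    """Hypothetical pinhole camera intrinsics, in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u: float, v: float, depth: float, k: PinholeIntrinsics) -> tuple:
    """Lift pixel (u, v) with metric depth Z to a 3D point (X, Y, Z) in the camera frame."""
    x = (u - k.cx) * depth / k.fx
    y = (v - k.cy) * depth / k.fy
    return (x, y, depth)


def finger_positions(center, closing_axis, width):
    """Place two fingertips symmetrically about the 3D grasp center.

    `closing_axis` is any nonzero 3D vector along which the gripper closes
    (an assumed convention); it is normalized before use.
    """
    norm = math.sqrt(sum(a * a for a in closing_axis))
    if norm == 0.0:
        raise ValueError("closing_axis must be nonzero")
    unit = [a / norm for a in closing_axis]
    half = width / 2.0
    left = tuple(c - half * a for c, a in zip(center, unit))
    right = tuple(c + half * a for c, a in zip(center, unit))
    return left, right
```

For example, with `fx = fy = 600`, `cx = 320`, `cy = 240`, the pixel (920, 240) at 2 m depth back-projects to (2.0, 0.0, 2.0) m; the grasp direction and in-plane angle would then orient the closing axis.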
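Light has many properties that vision sensors can measure passively. Color-band-separated wavelength and intensity are arguably the most commonly used ones for monocular 6D object pose estimation. This paper explores how complementary polarization information, i.e., the orientation of light-wave oscillations, influences the accuracy of pose predictions. A hybrid model that leverages physical priors jointly with a data-driven learning strategy is designed and carefully tested on objects with different degrees of photometric complexity. Our design not only significantly improves pose accuracy compared with state-of-the-art photometric methods, but also enables object pose estimation for highly reflective and transparent objects.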
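The abstract does not spell out which physical priors the hybrid model uses. As background only, the sketch below shows the standard way the four intensities of a division-of-focal-plane polarization camera (polarizer angles 0°, 45°, 90°, 135°) are turned into the linear Stokes parameters, the degree of linear polarization (DoLP), and the angle of linear polarization (AoLP). These are common polarimetric cues, but assuming they are exactly the paper's inputs would be speculation.

```python
import math

# Background sketch of standard linear polarimetry; whether the paper uses exactly
# these quantities as its physical priors is an assumption.


def linear_stokes(i0: float, i45: float, i90: float, i135: float) -> tuple:
    """Linear Stokes parameters (S0, S1, S2) from four polarizer-angle intensities."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity (average of two estimates)
    s1 = i0 - i90                        # horizontal vs. vertical component
    s2 = i45 - i135                      # +45 deg vs. -45 deg component
    return s0, s1, s2


def dolp_aolp(i0: float, i45: float, i90: float, i135: float, eps: float = 1e-8) -> tuple:
    """Degree (0..1) and angle (radians, in (-pi/2, pi/2]) of linear polarization."""
    s0, s1, s2 = linear_stokes(i0, i45, i90, i135)
    dolp = math.hypot(s1, s2) / max(s0, eps)  # eps guards against dark pixels
    aolp = 0.5 * math.atan2(s2, s1)
    return dolp, aolp
```

Fully polarized light gives a DoLP of 1 and unpolarized light gives 0. Specular reflection from glossy or transparent surfaces partially polarizes light, which is one reason these cues carry information that plain intensity lacks for such objects.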
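Indirect time-of-flight (I-ToF) imaging is a widespread depth-estimation modality on mobile devices owing to its small size and affordable price. Previous works have mainly focused on improving I-ToF imaging quality, in particular on mitigating the effect of multi-path interference (MPI). These investigations are typically carried out in specifically constrained scenarios: at close range, indoors, and under little ambient light. Surprisingly little work has investigated I-ToF quality improvement in real-life scenarios, where strong ambient light and long distances cause difficulties through the induced shot noise and the signal sparsity that result from limited sensor power and light scattering. In this work, we propose a new learning-based, end-to-end depth prediction network that takes noisy raw I-ToF signals together with an RGB image and fuses their latent representations through a multi-step approach involving both implicit and explicit alignment, predicting a high-quality long-range depth map aligned with the RGB viewpoint. We test our approach on challenging real-world scenes and show an RMSE improvement of more than 40% on the final depth map compared with baseline methods.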
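For readers unfamiliar with I-ToF, the sketch below shows the textbook continuous-wave depth computation from four phase-shifted correlation samples. It assumes the idealized sample model `q(θ) = B + A·cos(φ − θ)`, with offset `B` (including ambient light) and return amplitude `A`; real sensors differ in sign conventions and calibration, and this is not the learning-based network proposed in the paper. It does make the noise problem concrete: the constant offset `B` cancels in the differences, but the shot noise that strong ambient light adds to each sample does not, and weak returns at long range shrink `A`.

```python
import math

# Textbook four-bucket CW-ToF sketch under an idealized, assumed sample model;
# not the paper's method.

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def itof_depth(q0: float, q90: float, q180: float, q270: float, f_mod: float) -> tuple:
    """Depth (m) and amplitude from four correlation samples.

    Assumes q_theta = B + A * cos(phi - theta) for theta in {0, 90, 180, 270} degrees.
    """
    sin_part = q90 - q270  # = 2A sin(phi); the offset B cancels
    cos_part = q0 - q180   # = 2A cos(phi)
    phase = math.atan2(sin_part, cos_part) % (2.0 * math.pi)  # wrap to [0, 2*pi)
    depth = SPEED_OF_LIGHT * phase / (4.0 * math.pi * f_mod)  # light travels there and back
    amplitude = 0.5 * math.hypot(sin_part, cos_part)
    return depth, amplitude


def unambiguous_range(f_mod: float) -> float:
    """Maximum depth before the measured phase wraps around."""
    return SPEED_OF_LIGHT / (2.0 * f_mod)
```

At a 20 MHz modulation frequency the unambiguous range is about 7.5 m; lowering the frequency extends this range at the cost of depth precision, a well-known trade-off that adds to the difficulty of long-range I-ToF.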
With the rapid development of drone technologies, drones are widely used in many applications, including military domains. This paper proposes a novel situation-aware, deep reinforcement learning (DRL)-based autonomous nonlinear drone mobility control algorithm for cyber-physical loitering munition applications. On the battlefield, designing a DRL-based autonomous control algorithm is not straightforward because real-world data gathering is generally not possible. We therefore construct a cyber-physical virtual environment in Unity. Based on virtual cyber-physical battlefield scenarios, a DRL-based automated nonlinear drone mobility control algorithm can be designed, evaluated, and visualized. Moreover, real-world battlefield scenarios contain many obstacles that hamper linear trajectory control. Our proposed algorithm therefore uses situation-aware components implemented with the Raycast function in the Unity virtual scenarios. Based on the gathered situation-aware information, the drone can autonomously and nonlinearly adjust its trajectory during flight, which is beneficial for avoiding obstacles on obstacle-laden battlefields. Our visualization-based performance evaluation shows that the proposed algorithm outperforms the other linear mobility control algorithms.
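The situation-aware component is described only as being built on Unity's Raycast function. To illustrate the kind of information a fan of rays provides, the sketch below casts 2D rays against circular obstacles and picks the heading offset with the most clearance. The obstacle model, the fan layout, and the greedy heading rule are all hypothetical; the paper's controller is a trained DRL policy rather than this heuristic, and the code neither uses nor reproduces the Unity API.

```python
import math
from dataclasses import dataclass

# Hypothetical 2D illustration of ray-based obstacle sensing; not Unity's Raycast
# and not the paper's DRL controller.


@dataclass(frozen=True)
class Circle:
    """Hypothetical circular obstacle in the 2D plane."""
    cx: float
    cy: float
    r: float


def ray_circle(ox, oy, dx, dy, c: Circle):
    """Distance along the unit ray (ox, oy) + t * (dx, dy) to circle c, or None if missed."""
    fx, fy = ox - c.cx, oy - c.cy
    b = fx * dx + fy * dy
    disc = b * b - (fx * fx + fy * fy - c.r * c.r)
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):  # nearest non-negative intersection first
        if t >= 0.0:
            return t
    return None


def cast_fan(x, y, heading, obstacles, n_rays=5, fov=math.pi / 2, max_range=10.0):
    """Return (angle_offset, clearance) for n_rays >= 2 rays spread evenly across fov."""
    readings = []
    for i in range(n_rays):
        offset = -fov / 2 + i * fov / (n_rays - 1)
        dx, dy = math.cos(heading + offset), math.sin(heading + offset)
        hits = [t for c in obstacles if (t := ray_circle(x, y, dx, dy, c)) is not None]
        readings.append((offset, min(hits + [max_range])))  # clearance capped at max_range
    return readings


def pick_offset(readings):
    """Greedy rule: prefer the most clearance, then the smallest turn."""
    return min(readings, key=lambda r: (-r[1], abs(r[0])))[0]
```

With an obstacle of radius 1 m centered 5 m straight ahead, the center ray reports 4 m of clearance while the rays at ±22.5° are clear, so the greedy rule turns by 22.5°; a learned policy would instead consume such readings as part of its state.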
Supervision for metric learning has long been given in the form of equivalence between human-labeled classes. Although this type of supervision has been a basis of metric learning for decades, we argue that it hinders further advances in the field. In this regard, we propose a new regularization method, dubbed HIER, that discovers the latent semantic hierarchy of training data and deploys the hierarchy to provide richer and more fine-grained supervision than the inter-class separability induced by common metric learning losses. HIER achieves this goal without any annotation of the semantic hierarchy, instead learning hierarchical proxies in hyperbolic space. The hierarchical proxies are learnable parameters, and each of them is trained to serve as an ancestor of a group of data or other proxies, approximating the semantic hierarchy among them. HIER handles the proxies together with the data in hyperbolic space, since the geometric properties of that space are well suited to representing their hierarchical structure. The efficacy of HIER was evaluated on four standard benchmarks, where it consistently improved the performance of conventional methods when integrated with them, and consequently achieved the best records in almost all settings, surpassing even the existing hyperbolic metric learning technique.
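The abstract does not say which model of hyperbolic space HIER uses. A common choice is the Poincaré ball; the sketch below assumes that model with curvature -1 and shows its distance function, which grows rapidly toward the boundary of the ball. That growth is what makes hyperbolic space a good fit for hierarchies: a proxy near the origin can stay relatively close to many descendants that are themselves far apart, much like a parent node in a tree. The toy points are hypothetical and do not come from the paper.

```python
import math

# Assumed model: the Poincare ball with curvature -1. HIER's exact choice is not
# stated in the abstract; the toy points below are hypothetical.


def poincare_distance(x, y):
    """Geodesic distance between two points strictly inside the unit Poincare ball."""
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    if sq_x >= 1.0 or sq_y >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_x) * (1.0 - sq_y))
    return math.acosh(arg)


# Toy hierarchy: an "ancestor" proxy at the origin and two "children" near the boundary.
root = (0.0, 0.0)
child_a = (0.9, 0.0)
child_b = (0.0, 0.9)
```

Here each child lies about 2.94 from the root, while the two children are about 5.20 apart, close to the 5.89 of the path through the root; in a tree, the distance between siblings is exactly the path through their parent.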
Steering language generation toward objectives or away from undesired content has been a long-standing goal in using language models (LMs). Recent work has demonstrated that reinforcement learning and weighted decoding are effective approaches for achieving a higher level of language control and quality, each with its own pros and cons. In this work, we propose a novel critic decoding method for controlled language generation (CriticControl) that combines the strengths of reinforcement learning and weighted decoding. Specifically, we adopt the actor-critic framework to train an LM-steering critic from non-differentiable reward models. Similar to weighted decoding, our method freezes the language model and manipulates the output token distribution using the trained critic, which improves training efficiency and stability. Evaluation on three controlled generation tasks, namely topic control, sentiment control, and detoxification, shows that our approach generates more coherent and better-controlled texts than previous methods. In addition, CriticControl demonstrates superior generalization ability in zero-shot settings. Human evaluation studies also corroborate our findings.
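In weighted decoding, the frozen LM's next-token distribution is rescaled by a score for each candidate token and then renormalized. The sketch below shows one generic form, adding a scaled score to the logits before the softmax, with hypothetical per-token critic values and a hypothetical strength `beta`. The abstract does not give the exact way CriticControl combines its critic with the LM distribution, so this illustrates the idea rather than the paper's formula.

```python
import math

# Generic weighted-decoding sketch; the combination rule and `beta` are assumptions,
# not CriticControl's published formula.


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def steer(lm_logits, critic_values, beta=1.0):
    """Reweight the frozen LM's next-token distribution toward high-value tokens.

    Equivalent to p_steered(x) proportional to p_lm(x) * exp(beta * V(x)).
    """
    if len(lm_logits) != len(critic_values):
        raise ValueError("one critic value per candidate token is required")
    return softmax([z + beta * v for z, v in zip(lm_logits, critic_values)])
```

With `beta = 0` the LM's distribution is returned unchanged; larger values push probability mass toward tokens the critic rates highly, which can trade fluency for control. Because the LM itself stays frozen, only the critic's parameters need training.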
Task-oriented dialogue (TOD) systems are mainly based on the slot-filling-based TOD (SF-TOD) framework, in which dialogues are broken down into smaller, controllable units (i.e., slots) to fulfill a specific task. A series of approaches based on this framework has achieved remarkable success on various TOD benchmarks. However, we argue that the current TOD benchmarks are limited in how well they represent real-world scenarios, and that current TOD models are still a long way from handling such scenarios. In this position paper, we first identify the current status and limitations of SF-TOD systems. We then explore the WebTOD framework, an alternative direction for building a scalable TOD system when a web/mobile interface is available. In WebTOD, the dialogue system, powered by a large-scale language model, learns to understand the web/mobile interface that the human agent interacts with.
Recent studies have proposed unified user modeling frameworks that leverage user behavior data from various applications. Most of them benefit from representing users' behavior sequences as plain text, which can express rich information from any domain or system without losing generality. Hence, a question arises: can language modeling on a corpus of user histories help improve recommender systems? Although the versatility of this approach has been widely investigated in many domains, its application to recommender systems remains underexplored. We show that language modeling applied directly to task-specific user histories achieves excellent results on diverse recommendation tasks. Moreover, leveraging additional task-agnostic user histories delivers significant performance benefits. We further demonstrate that our approach can provide promising transfer learning capabilities for a broad spectrum of real-world recommender systems, even on unseen domains and services.
With the arrival of the noisy intermediate-scale quantum (NISQ) era and beyond, quantum federated learning (QFL) has recently become an emerging field of study. In QFL, each quantum computer or device locally trains its quantum neural network (QNN) with trainable gates and communicates only these gate parameters over classical channels, avoiding costly quantum communications. To enable QFL under various channel conditions, in this article we develop a depth-controllable architecture of entangled slimmable quantum neural networks (eSQNNs) and propose entangled slimmable QFL (eSQFL), which communicates the superposition-coded parameters of eSQNNs. Compared with existing depth-fixed QNNs, training the depth-controllable eSQNN architecture is more challenging because of high entanglement entropy and inter-depth interference; these are mitigated, respectively, by introducing entanglement controlled universal (CU) gates and an in-place fidelity distillation (IPFD) regularizer that penalizes inter-depth quantum state differences. Furthermore, we optimize the superposition coding power allocation by deriving and minimizing the convergence bound of eSQFL. In an image classification task, extensive simulations corroborate the effectiveness of eSQFL in terms of prediction accuracy, fidelity, and entropy compared with vanilla QFL, as well as under different channel conditions and various data distributions.